45 research outputs found

    Universal Approximation of Markov Kernels by Shallow Stochastic Feedforward Networks

    We establish upper bounds for the minimal number of hidden units for which a binary stochastic feedforward network with sigmoid activation probabilities and a single hidden layer is a universal approximator of Markov kernels. We show that each possible probabilistic assignment of the states of $n$ output units, given the states of $k \geq 1$ input units, can be approximated arbitrarily well by a network with $2^{k-1}(2^{n-1}-1)$ hidden units. Comment: 13 pages, 3 figures
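
    A minimal sketch of this architecture, under standard conventions (binary units, each sampled as a Bernoulli variable whose success probability is a sigmoid of its affine pre-activation). The parameter values, helper names, and sampling loop below are illustrative assumptions, not taken from the paper; `sufficient_hidden_units` simply evaluates the bound $2^{k-1}(2^{n-1}-1)$ quoted above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_layer(prev, W, b, rng):
    # Each unit is an independent Bernoulli with probability sigmoid(W @ prev + b).
    p = sigmoid(W @ prev + b)
    return (rng.random(p.shape) < p).astype(np.int8)

def sample_output(x, W1, b1, W2, b2, rng):
    # One stochastic forward pass x -> hidden -> y; repeating it samples the kernel p(y | x).
    h = sample_layer(x, W1, b1, rng)
    return sample_layer(h, W2, b2, rng)

def sufficient_hidden_units(k, n):
    # Upper bound from the abstract: 2^(k-1) * (2^(n-1) - 1) hidden units suffice.
    return 2 ** (k - 1) * (2 ** (n - 1) - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n = 3, 3
    m = sufficient_hidden_units(k, n)                  # 2^2 * (2^2 - 1) = 12
    W1, b1 = rng.normal(size=(m, k)), rng.normal(size=m)
    W2, b2 = rng.normal(size=(n, m)), rng.normal(size=n)
    x = np.array([1, 0, 1], dtype=np.int8)
    ys = [tuple(sample_output(x, W1, b1, W2, b2, rng)) for _ in range(5000)]
    # Empirical estimate of p(y | x) for this (random) parameter setting.
    freqs = {y: ys.count(y) / len(ys) for y in set(ys)}
    print(m, freqs)
```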

    Hierarchical Models as Marginals of Hierarchical Models

    We investigate the representation of hierarchical models in terms of marginals of other hierarchical models with smaller interactions. We focus on binary variables and marginals of pairwise interaction models whose hidden variables are conditionally independent given the visible variables. In this case the problem is equivalent to the representation of linear subspaces of polynomials by feedforward neural networks with soft-plus computational units. We show that every hidden variable can freely model multiple interactions among the visible variables, which allows us to generalize and improve previous results. In particular, we show that a restricted Boltzmann machine with fewer than $[2(\log(v)+1)/(v+1)]\,2^v - 1$ hidden binary variables can approximate every distribution of $v$ visible binary variables arbitrarily well, compared to $2^{v-1}-1$ from the best previously known result. Comment: 18 pages, 4 figures, 2 tables, WUPES'1
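
    A hedged numerical comparison of the two sufficient hidden-unit counts quoted above. The base of the logarithm (taken as 2 here) and the absence of rounding are assumptions made only for illustration, and the improvement over the older bound is asymptotic in $v$.

```python
import math

def hidden_units_new(v):
    # [2(log(v) + 1) / (v + 1)] * 2^v - 1, reading log as log base 2 (an assumption here).
    return 2 * (math.log2(v) + 1) / (v + 1) * 2 ** v - 1

def hidden_units_old(v):
    # Previously best known sufficient count: 2^(v - 1) - 1.
    return 2 ** (v - 1) - 1

for v in (24, 48, 96):
    print(v, round(hidden_units_new(v)), hidden_units_old(v))
```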

    Mixtures and products in two graphical models

    We compare two statistical models of three binary random variables. One is a mixture model and the other is a product of mixtures model called a restricted Boltzmann machine. Although the parametrizations of the two models look different, we show that they represent the same set of distributions on the interior of the probability simplex and are equal up to closure. We give a semi-algebraic description of the model in terms of six binomial inequalities and obtain closed form expressions for the maximum likelihood estimates. We briefly discuss extensions to larger models. Comment: 18 pages, 7 figures
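
    A small numerical illustration of the "product of mixtures" reading of an RBM, under the standard parametrization (energy $-v^\top W h - b^\top v - c^\top h$); the number of hidden units and the parameter values below are arbitrary assumptions for illustration. Summing out the hidden units gives $p(v) \propto e^{b^\top v} \prod_j \bigl(1 + e^{c_j + v^\top W_{:,j}}\bigr)$, one factor per hidden unit, each proportional to a mixture of two product distributions (with the visible bias term itself a product distribution).

```python
import itertools
import numpy as np

def rbm_distribution(W, b, c):
    # Unnormalized p(v) = sum_h exp(v·W·h + b·v + c·h), by brute force over hidden states h.
    nv, nh = W.shape
    vs = list(itertools.product((0, 1), repeat=nv))
    hs = list(itertools.product((0, 1), repeat=nh))
    p = np.array([sum(np.exp(np.array(v) @ W @ np.array(h) + b @ np.array(v) + c @ np.array(h))
                      for h in hs)
                  for v in vs])
    return p / p.sum(), vs

def rbm_distribution_factored(W, b, c):
    # Same distribution via the product-of-experts form:
    # p(v) proportional to exp(b·v) * prod_j (1 + exp(c_j + v·W[:, j])).
    nv, nh = W.shape
    vs = list(itertools.product((0, 1), repeat=nv))
    p = np.array([np.exp(b @ np.array(v)) * np.prod(1 + np.exp(c + np.array(v) @ W))
                  for v in vs])
    return p / p.sum()

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))     # 3 visible bits, 2 hidden bits (hidden count chosen arbitrarily)
b, c = rng.normal(size=3), rng.normal(size=2)
p1, states = rbm_distribution(W, b, c)
p2 = rbm_distribution_factored(W, b, c)
print(np.allclose(p1, p2))      # the two parametrizations agree
print(dict(zip(states, np.round(p1, 4))))
```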

    Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

    We improve recently published results about the resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them Universal Approximators. We show that any distribution $p$ on the set of binary vectors of length $n$ can be arbitrarily well approximated by an RBM with $k-1$ hidden units, where $k$ is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of $p$. In important cases this number is half of the cardinality of the support set of $p$. We construct a DBN with $\frac{2^n}{2(n-b)}$, $b \sim \log(n)$, hidden layers of width $n$ that is capable of approximating any distribution on $\{0,1\}^n$ arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio (2010).
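
    A rough worked check of these counts (my arithmetic, not the paper's): if $p$ has full support on $\{0,1\}^n$, the hypercube can be partitioned into $2^{n-1}$ pairs of vectors differing in one entry (e.g. by flipping the last bit), so $k = 2^{n-1}$ and at most $2^{n-1}-1$ hidden units are needed in the RBM. For the DBN, taking $n = 16$ and $b \approx \log_2(16) = 4$ gives roughly

\[
\frac{2^{16}}{2\,(16-4)} = \frac{65536}{24} \approx 2731
\]

    hidden layers of width $16$.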

    When Does a Mixture of Products Contain a Product of Mixtures?

    We derive relations between theoretical properties of restricted Boltzmann machines (RBMs), popular machine learning models which form the building blocks of deep learning models, and several natural notions from discrete mathematics and convex geometry. We give implications and equivalences relating RBM-representable probability distributions, perfectly reconstructible inputs, Hamming modes, zonotopes and zonosets, point configurations in hyperplane arrangements, linear threshold codes, and multi-covering numbers of hypercubes. As a motivating application, we prove results on the relative representational power of mixtures of product distributions and products of mixtures of pairs of product distributions (RBMs) that formally justify widely held intuitions about distributed representations. In particular, we show that a mixture of products requires an exponentially larger number of parameters to represent the probability distributions that can be obtained as products of mixtures. Comment: 32 pages, 6 figures, 2 tables
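
    For context on the parameter comparison, a back-of-the-envelope count using the standard parametrizations of the two families; the specific numbers of variables and mixture components below are placeholders for illustration, not bounds from the paper. A mixture of $m$ product distributions on $n$ binary variables has $mn + m - 1$ free parameters, while an RBM with $n$ visible and $m$ hidden units has $nm + n + m$.

```python
def mixture_of_products_params(n, m):
    # m product components, each with n Bernoulli parameters, plus m - 1 mixture weights.
    return m * n + m - 1

def rbm_params(n, m):
    # n*m interaction weights, n visible biases, m hidden biases.
    return n * m + n + m

n = 20
print(rbm_params(n, n))                               # 440: polynomial in n
print(mixture_of_products_params(n, 2 ** (n // 2)))   # 21503: grows exponentially with the
                                                      # (placeholder) exponential component count
```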